ABSTRACT



A unified messaging system includes a voice gateway server 
coupled to an electronic mail system and a private branch
exchange (PBX). The voice gateway server provides voice 
messaging services to a set of subscribers. Within the voice 
gateway server, a trigraph analyzer sequentially examines 
3-character combinations within a text message; determines 
occurrence frequencies for the character combinations; com- 
pares the occurrence frequencies with reference occurrence 
statistics modeled from text samples written in particular 
languages; and generates a language identifier and a likeli- 
hood value for the text message. Based upon the language 
identifier, a message inquiry unit selects an appropriate 
text-to-speech engine for converting the text message into 
computer-generated speech that is played to a subscriber. 

25 Claims, 5 Drawing Sheets 
UNIFIED MESSAGING SYSTEM WITH AUTOMATIC LANGUAGE IDENTIFICATION
FOR TEXT-TO-SPEECH CONVERSION

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation application filed from
and claiming priority under 35 U.S.C. § 120 of co-pending
non-provisional U.S. patent application Ser. No. 09/099,744,
filed on Jun. 18, 1998, entitled "UNIFIED MESSAGING
SYSTEM WITH AUTOMATIC LANGUAGE IDENTIFICATION FOR
TEXT-TO-SPEECH CONVERSION," which claims priority, under
35 U.S.C. § 119(e) of provisional U.S. patent application
Ser. No. 60/051,720, filed on Jul. 3, 1997, and entitled
"UNIFIED MESSAGING SYSTEM WITH AUTOMATIC LANGUAGE
IDENTIFICATION FOR TEXT-TO-SPEECH CONVERSION." This
application also claims priority under 35 U.S.C. § 120 of
co-pending non-provisional U.S. patent application Ser. No.
09/479,333, filed on Jan. 7, 2000, entitled "UNIFIED
MESSAGING SYSTEM WITH VOICE MESSAGING AND TEXT MESSAGING
USING TEXT-TO-SPEECH CONVERSION." Application Ser. No.
09/099,744, filed on Jun. 18, 1998, application Ser. No.
09/479,333, filed on Jan. 7, 2000, and provisional
application Ser. No. 60/052,720, filed Jul. 3, 1997, are
hereby incorporated by reference. In addition, this
application relates to and incorporates by reference U.S.
Pat. No. 5,557,659, entitled "ELECTRONIC MAIL SYSTEM HAVING
INTEGRATED VOICE MESSAGES."

FIELD OF THE INVENTION

The present invention relates to systems and methods for
voice and text messaging, as well as systems and methods for
language recognition. More particularly, the present
invention is a communications system that automatically
identifies a language associated with a text message, and
performs an appropriate text-to-speech conversion.

BACKGROUND OF THE INVENTION

Computer-based techniques for converting text into
speech have become well-known in recent years. Via such
techniques, textual data is translated to audio information by
a text-to-speech conversion "engine," which most commonly
comprises software. Examples of text-to-speech software
include Apple Computer's Speech Manager (Apple
Computer Corporation, Cupertino, Calif.), and Digital
Equipment Corporation's DECTalk (Digital Equipment
Corporation, Cambridge, Mass.). In addition to converting
textual data into speech, such software is responsive to user
commands for controlling volume, pitch, rate, and other
speech-related parameters.

A text-to-speech engine generally comprises a text
analyzer, a syntax and context analyzer, and a synthesis
module. The text analyzer, in conjunction with the syntax
and context analyzer, utilizes a rule-based index to identify
fundamental grammatical units within textual data. The
fundamental grammatical units are typically word and/or
phoneme-based, and the rule-based index is correspondingly
referred to as a phoneme library. Those skilled in the art will
understand that the phoneme library typically includes a
word-based dictionary for the conversion of orthographic
data into a phonemic representation. The synthesis module
either assembles or generates speech sequences corresponding
to the identified fundamental grammatical units, and
plays the speech sequences to a listener.

Text-to-speech conversion can be very useful within the
context of unified or integrated messaging systems. In such
messaging systems, a voice processing server is coupled to
an electronic mail system, such that a user's e-mail in-box
provides message notification as well as access to messaging
services for e-mail messages, voice messages, and possibly
other types of messages such as faxes. An example of a
unified messaging system is Octel's Unified Messenger
(Octel Communications Corporation, Milpitas, Calif.). Such
systems selectively translate an e-mail message into speech
through the use of text-to-speech conversion. A user calling
from a remote telephone can therefore readily listen to both
voice and e-mail messages. Thus, a unified messaging
system employing text-to-speech conversion eliminates the
need for a user to have direct access to their computer during
message retrieval operations.

In many situations, messaging system users can expect to
receive textual messages written in different languages. For
example, a person conducting business in Europe might
receive e-mail messages written in English, French, or
German. To successfully convert text into speech within the
context of a particular language requires a text-to-speech
engine designed for that language. Thus, to successfully
convert French text into spoken French requires a
text-to-speech engine designed for the French language,
including a French-specific phoneme library. Attempting to
convert French text into spoken language through the use of an
English text-to-speech engine would likely produce a large
amount of unintelligible output.

In the prior art, messaging systems rely upon a human
reader to specify a given text-to-speech engine to be used in
converting a message into speech. Alternatively, some
systems enable a message originator to specify a language
identification code that is sent with the message. Both
approaches are inefficient and inconvenient. What is needed
is a messaging system providing automatic written language
identification as a prelude to text-to-speech conversion.

SUMMARY OF THE INVENTION

The present invention is a unified messaging system
providing automatic language identification for the
conversion of textual messages into speech. The unified
messaging system comprises a voice gateway server coupled to a
computer network and a Private Branch Exchange (PBX).
The computer network includes a plurality of computers
coupled to a file server, through which computer users
identified in an electronic mail (e-mail) directory exchange
messages. The voice gateway server facilitates the exchange
of messages between computer users and a telephone
system, and additionally provides voice messaging services
to subscribers, each of whom is preferably a computer user
identified in the e-mail directory.

The voice gateway server preferably comprises a voice
board, a network interface unit, a processing unit, a data
storage unit, and a memory wherein a set of voice messaging
application units; a message buffer; a plurality of
text-to-speech engines and corresponding phoneme libraries; a
trigraph analyzer; and a set of corecurrence libraries reside.
Each voice messaging application unit comprises program
instructions for providing voice messaging functions such as
call answering, automated attendant, and message
store/forward operations to voice messaging subscribers.

A message inquiry unit directs message playback
operations. In response to a subscriber's issuance of a voice
message review request, the message inquiry unit plays the
subscriber's voice messages in a conventional manner. In
response to a text message review request, the message
inquiry unit initiates automatic language identification
operations, followed by a text-to-speech conversion
performed in accordance with the results of the language
identification operations.

The trigraph analyzer examines a text sequence, and
performs language identification operations by first
determining the occurrence frequencies of sequential 3-character
combinations within the text, and then comparing the
determined occurrence frequencies with reference occurrence
statistics for various languages. The set of reference
occurrence statistics associated with a given language are stored
together as a corecurrence library. The trigraph analyzer
determines a closest match between the determined
occurrence frequencies and a particular corecurrence library, and
returns a corresponding language identifier and likelihood
value to the message inquiry unit.

The message inquiry unit subsequently selects a
text-to-speech engine and an associated phoneme library, and
initiates the conversion of the text message into
computer-generated speech that is played to the subscriber in a
conventional manner.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of a preferred embodiment of
a unified messaging system constructed in accordance with
the present invention.

FIG. 2 is a block diagram of a first and preferred
embodiment of a voice server constructed in accordance with the
present invention;

FIG. 3 is a flowchart of a first and preferred method for
providing automatic language identification for
text-to-speech conversion in the present invention;

FIG. 4 is a block diagram of a second embodiment of a
voice server constructed in accordance with the present
invention;

FIG. 5 is a flowchart of a second method for providing
automatic language identification for text-to-speech
conversion in the present invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

Referring now to FIG. 1, a block diagram of a preferred
embodiment of a unified messaging system 100 constructed
in accordance with the present invention is shown. The
unified messaging system 100 comprises a set of telephones
110, 112, 114 coupled to a Private Branch Exchange (PBX)
120; a computer network 130 comprising a plurality of
computers 132 coupled to a file server 134 via a network line
136, where the file server 134 is additionally coupled to a
data storage device 138; and a voice gateway server 140 that
is coupled to the network line 136, and coupled to the PBX
120 via a set of telephone lines 142 as well as an integration
link 144. The PBX 120 is further coupled to a telephone
network via a collection of trunks 122, 124, 126. The unified
messaging system 100 shown in FIG. 1 is equivalent to that
described in U.S. Pat. No. 5,557,659, entitled "Electronic
Mail System Having Integrated Voice Messages," which is
incorporated herein by reference. Those skilled in the art
will recognize that the teachings of the present invention are
applicable to essentially any unified or integrated messaging
environment.

In the present invention, conventional software executing
upon the computer network 130 provides file transfer
services, group access to software applications, as well as an
electronic mail (e-mail) system through which computer
users can transfer messages as well as message attachments
between their computers 132 via the file server 134. In an
exemplary embodiment, Microsoft Exchange™ software
(Microsoft Corporation, Redmond, Wash.) executes upon
the computer network 130 to provide such functionality.
Within the file server 134, an e-mail directory associates
each computer user's name with a message storage location,
or "in-box," and a network address, in a manner that will be
readily understood by those skilled in the art. The voice
gateway server 140 facilitates the exchange of messages
between the computer network 130 and a telephone system.
Additionally, the voice gateway server 140 provides voice
messaging services such as call answering, automated
attendant, voice message store and forward, and message
inquiry operations to voice messaging subscribers. In the
preferred embodiment, each subscriber is a computer user
identified in the e-mail directory, that is, having a computer
132 coupled to the network 130. Those skilled in the art will
recognize that in an alternate embodiment, the voice
messaging subscribers could be a subset of computer users. In
yet another alternate embodiment, the computer users could
be a subset of a larger pool of voice messaging subscribers,
which might be useful when the voice gateway server is
primarily used for call answering.

Referring also now to FIG. 2, a block diagram of a first
and preferred embodiment of a voice gateway server 140
constructed in accordance with the present invention is
shown. In the preferred embodiment, the voice gateway
server 140 comprises a voice board 200, a network interface
unit 202, a processing unit 204, a data storage unit 206, and
a memory 210 wherein a plurality of voice messaging
application units 220, 222, 224, 226; a message buffer 230;
a plurality of text-to-speech engines 242, 243, 244 and
corresponding phoneme libraries 252, 253, 254; a trigraph
analyzer 260; and a plurality of corecurrence libraries 272, 273,
274, 275, 276 reside. Each element within the voice gateway
server 140 is coupled to a bus 299. The network
interface unit 202 is additionally coupled to the network line
136, and the voice board 200 is coupled to the PBX 120.

The voice board 200 preferably comprises conventional
circuitry that interfaces a computer system with telephone
switching equipment, and provides telephony and voice
processing functions. The network interface unit 202
preferably comprises conventional circuitry that manages data
transfers between the voice gateway server 140 and the
computer network 130. In the preferred embodiment, the
processing unit 204 and the data storage unit 206 are also
conventional.

The voice messaging application units 220, 222, 224, 226
provide voice messaging services to subscribers, including
call answering, automated attendant, and voice message
store and forward operations. A message inquiry unit 226
directs telephone-based message playback operations in
response to a subscriber request. In response to a voice
message review request, the message inquiry unit 226
initiates the retrieval of a voice message associated with the
subscriber's in-box, followed by the playing of the voice
message to the user via the telephone in a conventional
manner. In response to a text message review request, the
message inquiry unit 226 initiates retrieval of a text message
associated with the subscriber's in-box, followed by
automatic language recognition and text-to-speech conversion
operations, as described in detail below with reference to
FIG. 3. In the preferred embodiment, each voice messaging
application unit 220, 222, 224, 226 comprises program
instruction sequences that are executable by the processing
unit 204.
The message buffer 230 comprises a portion of the
memory 210 reserved for temporarily storing messages
before or after message exchange with the file server 134.
The text-to-speech engines 242, 243, 244, 245, 246 prefer-
ably comprise conventional software for translating textual
data into speech. Those skilled in the art will readily 
understand that in an alternate embodiment, one or more 
portions of a text-to-speech engine 242, 243, 244, 245, 246 
could be implemented using hardware. 

The number of text-to-speech engines 242, 243, 244 
resident within the memory 210 at any given time is deter- 
mined according to the language environment in which the 
present invention is employed. In the preferred embodiment, 
the memory 210 includes a text-to-speech engine 242, 243, 
244 for each language within a group of most-commonly
expected languages. Additional text-to-speech engines 245, 
246 preferably reside upon the data storage unit 206, and are 
loaded into the memory 210 when text-to-speech conversion 
for a language outside the aforementioned group is required, 
as described in detail below. In an exemplary embodiment, 
text-to-speech engines 242, 243, 244 corresponding to 
English, French, and German reside within the memory 210, 
while text-to-speech engines 245, 246 for Portuguese, 
Italian, and/or other languages reside upon the data storage 
unit 206. Those skilled in the art will recognize that in an 
alternate embodiment, the number of text-to-speech engines 
242, 243, 244 resident within the memory could be deter- 
mined according to a memory management technique, such 
as virtual memory methods, where text-to-speech engines
242, 243, 244 are conventionally swapped out to the data 
storage unit 206 as required. 
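
The residency arrangement described above, with common engines kept in memory and the remainder transferred from storage on demand, can be sketched as follows. This is a minimal illustration only: the `EngineStore` class, the `fake_engine` stand-in, and the two-letter language keys are hypothetical names introduced here and do not appear in the specification.

```python
from typing import Callable, Dict

# Hypothetical engine type: any callable that turns text into audio bytes.
Engine = Callable[[str], bytes]


class EngineStore:
    """Keeps commonly used engines resident and loads the rest on demand."""

    def __init__(self, resident: Dict[str, Engine],
                 loaders: Dict[str, Callable[[], Engine]]) -> None:
        self.resident = dict(resident)  # engines already in memory
        self.loaders = dict(loaders)    # engines kept on the data storage unit
        self.loads = 0                  # number of transfers into memory

    def get(self, language: str) -> Engine:
        if language not in self.resident:
            if language not in self.loaders:
                raise KeyError(f"no text-to-speech engine for {language!r}")
            # Transfer the engine into memory the first time it is needed.
            self.resident[language] = self.loaders[language]()
            self.loads += 1
        return self.resident[language]


def fake_engine(tag: str) -> Engine:
    # Stand-in for a real engine; it only labels the text.
    return lambda text: f"[{tag}] {text}".encode("utf-8")


store = EngineStore(
    resident={"en": fake_engine("en"), "fr": fake_engine("fr"),
              "de": fake_engine("de")},
    loaders={"pt": lambda: fake_engine("pt"), "it": lambda: fake_engine("it")},
)
```

In this sketch a request for English never touches storage, while the first request for Portuguese triggers one load and later requests reuse the resident copy.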

The memory 210 preferably includes a conventional 
phoneme library 252, 253, 254 corresponding to each text- 
to-speech engine 242, 243, 244 residing therein. In the 
preferred embodiment, a phoneme library 255, 256 also 
resides upon the data storage unit 206 for each text-to- 
speech engine 245, 246 stored thereupon. 

The present invention preferably relies upon an n-graph
method for textual language identification, in particular, 
techniques developed by Clive Souter and Gavin Churcher 
at the University of Leeds in the United Kingdom, as 
reported in 1) "Bigram and Trigram Models for Language 
Identification and Classification," Proceedings of the AISB 
Workshop on Computational Linguistics for Speech and 
Handwriting Recognition, University of Leeds, 1994; 2) 
"Natural Language Identification Using Corpus-Based 
Models," Hermes Journal of Linguistics 13: 183-204, 1994; 
and 3) "N-gram Tools for Generic Symbol Processing," M. 
Sc. Thesis of Phil Cave, School of Computer Studies, 
University of Leeds, 1995. 

In n-graph language identification, the occurrence fre- 
quencies of successive n-character combinations within a 
textual message are compared with reference n-character 
occurrence statistics associated with particular languages. 
The reference statistics for any given language are automati- 
cally derived or modeled from text samples taken from that 
language. Herein, the reference n-character occurrence sta- 
tistics for a given language are stored together as a core- 
currence library 272, 273, 274, 275, 276. 
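
As a rough illustration of how reference statistics might be modeled from text samples, the sketch below counts overlapping n-character combinations across a set of samples and normalizes the counts. Storing the statistics as relative frequencies, lowercasing the samples, and the names `ngraphs` and `build_library` are all assumptions made for this example; the specification does not state how a corecurrence library is represented.

```python
from collections import Counter
from typing import Dict, Iterable


def ngraphs(text: str, n: int = 3) -> Iterable[str]:
    """Yield successive overlapping n-character combinations."""
    for i in range(len(text) - n + 1):
        yield text[i:i + n]


def build_library(samples: Iterable[str], n: int = 3) -> Dict[str, float]:
    """Model reference occurrence statistics from text samples.

    Assumption: statistics are stored as relative frequencies.
    """
    counts: Counter = Counter()
    for sample in samples:
        counts.update(ngraphs(sample.lower(), n))
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()} if total else {}


# Toy English "library" built from two very short samples.
english = build_library(["the cat and the hat", "then the other"])
```

A usable library would be built from a far larger corpus per language; the two short strings here only demonstrate the mechanics.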

The present invention preferably employs the trigraph 
analyzer 260 and corecurrence libraries 272, 273, 274, 275, 
276 to perform trigraph-based language identification, that 
is, language identification based upon the statistical occur- 
rences of three-letter combinations. In the preferred 
embodiment, the memory 210 includes a corecurrence 
library 272, 273, 274, 275, 276 corresponding to each 
text-to-speech engine 242, 243, 244 within the memory 210
as well as each text-to-speech engine 245, 246 stored upon
the data storage unit 206.

The trigraph analyzer 260 returns a language identifier
and a likelihood or percentage value that indicates relative
language identification certainty. As developed at the Uni-
versity of Leeds, the trigraph analyzer 260 is approximately
100% accurate when textual input comprises at least 175
characters. The trigraph analyzer 260 additionally maintains
high language identification accuracy, typically greater than
90%, for shorter-length text sequences. In an exemplary
embodiment, the voice gateway server 140 is a personal
computer having a 200 MHz Intel Pentium™ Processor
(Intel Corporation, Santa Clara, Calif.); 128 Megabytes of
Random Access Memory (RAM); an Ethernet-based net-
work interface unit 202; a Redundant Array of Inexpensive
Disks (RAID) drive serving as the data storage unit 206; a
Rhetorex voice board (Rhetorex Corporation, San Jose,
Calif.); DECTalk text-to-speech engines 242, 243, 244, 245,
246 and corresponding phoneme libraries 252, 253, 254,
255, 256 (Digital Equipment Corporation, Cambridge,
Mass.); the aforementioned trigraph analyzer 260 and asso-
ciated corecurrence libraries 272, 273, 274, 275, 276 devel-
oped at the University of Leeds; and voice messaging
application units 220, 222, 224, 226 implemented using
Octel's Unified Messenger software (Octel Communications
Corporation, Milpitas, Calif.).

Referring now to FIG. 3, a flowchart of a first and 
preferred method for providing automatic language identi- 
fication for text-to-speech conversion is shown. The pre- 
ferred method begins in step 300 in response to a subscrib- 
er's issuance of a text message review request, with the 
message inquiry unit 226 retrieving a text message from the 
subscriber's in-box, or from a particular data file or folder as 
specified by the subscriber. In the preferred embodiment, the 
subscriber's in-box corresponds to a file server storage 
location, and the retrieved text message is transferred to the 
message buffer 230. Following step 300, the message 
inquiry unit 226 issues an identification directive to the 
trigraph analyzer 260 in step 302, thereby initiating lan- 
guage identification. 

In response to the identification directive, the trigraph
analyzer 260 examines successive 3-character combinations
within the text message currently under consideration, and
determines occurrence frequencies for the character combi-
nations in step 304. In the preferred embodiment, the
trigraph analyzer 260 examines the first 175 characters of
the text message in the event that the text message is
sufficiently long; otherwise, the trigraph analyzer 260 exam-
ines the longest character sequence possible.
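
Step 304 can be illustrated with a short sketch that counts successive 3-character combinations over at most the first 175 characters of a message. The function name `trigraph_frequencies` and the use of raw counts are assumptions for illustration; no case folding or whitespace normalization is applied here because the text does not specify any.

```python
from collections import Counter

MAX_CHARS = 175  # sample length named in the text


def trigraph_frequencies(message: str, limit: int = MAX_CHARS) -> Counter:
    """Count successive 3-character combinations in the examined portion."""
    sample = message[:limit]  # a shorter message is simply used whole
    return Counter(sample[i:i + 3] for i in range(len(sample) - 2))
```

For example, `trigraph_frequencies("banana")` counts "ban" once, "ana" twice, and "nan" once, and a message shorter than three characters yields an empty count.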

Following the determination of the occurrence frequen-
cies for the current text message, the trigraph analyzer 260
compares the occurrence frequencies with the reference
occurrence statistics in each corecurrence library 272, 273,
274, 275, 276 and determines a closest match with a
particular corecurrence library 272, 273, 274, 275 in step
308. Upon determining the closest match, the trigraph
analyzer 260 returns a language identifier and an associated
likelihood value to the message inquiry unit 226 in step 310.
Those skilled in the art will recognize that the trigraph
analyzer 260 could return a set of language identifiers and a
likelihood value corresponding to each language identifier in
an alternate embodiment.
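
The comparison and closest-match steps can be sketched as below. The specification does not give the matching formula, so the scoring rule used here (message counts weighted by reference relative frequencies, with each score normalized to a share of the total) is purely an assumption, as are the names `score` and `rank_languages` and the toy library contents.

```python
from collections import Counter
from typing import Dict, List, Tuple

Library = Dict[str, float]  # trigraph -> reference relative frequency


def score(freqs: Counter, library: Library) -> float:
    """Assumed similarity: message counts weighted by reference frequencies."""
    return sum(count * library.get(gram, 0.0) for gram, count in freqs.items())


def rank_languages(freqs: Counter,
                   libraries: Dict[str, Library]) -> List[Tuple[str, float]]:
    """Return (language identifier, likelihood) pairs, best match first."""
    scores = {lang: score(freqs, lib) for lang, lib in libraries.items()}
    total = sum(scores.values())
    if total == 0:
        return [(lang, 0.0) for lang in scores]  # nothing matched any library
    ranked = [(lang, s / total) for lang, s in scores.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Toy libraries and a toy message profile, for illustration only.
libraries = {
    "en": {"the": 0.6, "he ": 0.4},
    "fr": {"les": 0.5, "es ": 0.5},
}
message = Counter({"the": 2, "he ": 1, "es ": 1})
best_language, likelihood = rank_languages(message, libraries)[0]
```

Taking only the first pair corresponds to returning a single language identifier and likelihood value; returning the whole ranked list corresponds to the alternate embodiment in which a set of identifiers is returned.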

As long as the text message is written in a language
corresponding to one of the corecurrence libraries 272, 273,
274, 275, 276, the correlation between the occurrence
frequencies and the reference occurrence statistics is likely to
be sufficient for successful language identification. If the text
message is written in a language that does not correspond to
any of the corecurrence libraries 272, 273, 274, 275, 276
present, the correlation will be poor, and a closest match
cannot be determined. In the event that the likelihood value
returned by trigraph analyzer 260 is below a minimum
acceptable threshold (for example, 20%), the message
inquiry unit 226 plays a corresponding prerecorded message
to the subscriber via steps 312 and 318. An exemplary
prerecorded message could be "language identification
unsuccessful."

Upon receiving the language identifier and an acceptable
likelihood value, the message inquiry unit 226 selects the
appropriate text-to-speech engine 242, 243, 244, 245, 246 in
step 314. In the event that the text-to-speech engine 244, 245
and its associated phoneme library 254, 255 do not presently
reside within the memory 210, the message inquiry unit 226
transfers the required text-to-speech engine 244, 245 and the
corresponding phoneme library 254, 255 from the data
storage unit 206 into the memory 210.

After step 314, the message inquiry unit 226 issues a
conversion directive to the selected text-to-speech engine
242, 243, 244, 245, 246 in step 316, following which the text
message currently under consideration is converted to
speech and played to the subscriber in a conventional
manner. Upon completion of step 316, the message inquiry
unit 226 determines whether another text message in the
subscriber's in-box, or as specified by the subscriber,
requires consideration in step 320. If so, the preferred
method proceeds to step 300; otherwise, the preferred
method ends.

In an alternate embodiment, steps 312 and 318 could be
omitted, such that step 310 directly proceeds to step 314 to
produce a "best guess" text-to-speech conversion played to
the subscriber. In such an alternate embodiment, the
message inquiry unit 226 could 1) disregard the likelihood
value; or 2) select the language identifier associated with a
best likelihood value in the event that multiple language
identifiers and likelihood values are returned.

In the preferred embodiment, textual language
identification is performed, followed by text-to-speech conversion in
the appropriate language. This results in the subscriber
listening to computer-generated speech that matches the
language in which the original text message was written. In
an alternate embodiment, textual language identification
could be performed, followed by text-to-text language
conversion (i.e., translation), followed by text-to-speech
conversion such that the subscriber listens to computer-generated
speech in a language with which the subscriber is most
comfortable. To facilitate this alternate embodiment, a set of
subscriber language preference selections are stored as
user-configuration data within a subscriber information database
or directory. The subscriber information database could
reside within the voice gateway server 140, or it could be
implemented in association with the file server's e-mail
directory in a manner those skilled in the art will readily
understand. Additionally, the voice gateway server 140 is
modified to include additional elements, as described in
detail hereafter.

Referring now to FIG. 4, a block diagram of a second
embodiment of a voice gateway server 141 constructed in
accordance with the present invention is shown. Elements
common to both FIGS. 2 and 4 are numbered alike for ease of
understanding. In addition to having the elements shown in
FIG. 2, the second embodiment of the voice gateway server
141 includes a set of conventional text translators 282, 283,
284, 285, 286, each having an associated word dictionary
292, 293, 294, 295, 296. Those skilled in the art will
understand that the word dictionaries 292, 293, 294, 295,
296 are distinct from (i.e., not equivalent to) the phoneme
libraries 252, 253, 254, 255, 256 in content and manner of
use, and that each text translator 282, 283, 284, 285, 286
corresponds to a particular target language available for
subscriber selection. Text translators 282, 283, 284 and word
dictionaries 292, 293, 294 corresponding to most-common
subscriber preference selections reside within the memory
210, while those for less-frequently selected languages
reside upon the data storage unit 206 and are transferred
into the memory 210 as required. Those skilled in the art will
recognize that in an alternate embodiment, the text
translators 282, 283, 284, 285, 286 and corresponding word
dictionaries 292, 293, 294, 295, 296 could normally reside
upon the data storage unit 206, and be transferred into the
memory 210 as required during system operation. In
an exemplary embodiment, the text translators 282, 283,
284, 285, 286 and word dictionaries 292, 293, 294, 295, 296
could be implemented using commercially-available
software such as that provided by Translation Experts, Ltd. of
London, England; or Language Partners International of
Evanston, Ill.

Referring now to FIG. 5, a flowchart of a second method
for providing automatic language identification for
text-to-speech conversion is shown. The second method begins in
step 500 in response to a subscriber's issuance of a text
message review request, with the message inquiry unit 226
retrieving the subscriber's language preference settings.
Next, in step 501, the message inquiry unit retrieves a text
message from the subscriber's in-box or from a data file or
data folder as specified by the subscriber, and stores or
copies the retrieved message into the message buffer 230.
Following step 501, the message inquiry unit 226 issues an
identification directive to the trigraph analyzer 260 in step
502, thereby initiating language identification. Language
identification is preferably performed in steps 504 through
512 in an analogous manner to that described above in steps
304 through 312 of FIG. 3. Successful language
identification results when the trigraph analyzer 260 returns a
language identifier and a likelihood value greater than a
minimum threshold value to the message inquiry unit 226.

Upon receiving a language identifier and an acceptable
likelihood value, the message inquiry unit 226 selects the
appropriate text translator 282, 283, 284, 285, 286 and
associated word dictionary 292, 293, 294, 295, 296 and
issues a translation directive in step 514, thereby performing
the translation of the current text message into the target
language given by the subscriber's language preference
setting. Next, in step 516, the message inquiry unit 226
issues a conversion directive to the text-to-speech engine
242, 243, 244, 245, 246 that corresponds to the subscriber's
language preference settings, causing the conversion of the
translated text message to speech. The speech is preferably
played to the subscriber in a conventional manner. Upon
completion of step 516, the message inquiry unit 226
determines whether another text message in the subscriber's
in-box or as specified by the subscriber requires
consideration in step 520. If so, the preferred method proceeds to step
501; otherwise, the preferred method ends.

Those skilled in the art will recognize that in the alternate
embodiment, each word dictionary 292, 293, 294, 295, 296
should include words that may be particular to a given work
environment in which the present invention may be
employed. For example, use of the alternate embodiment in
a computer-related business setting would necessitate word
dictionaries 292, 293, 294, 295, 296 that include computer- 
related terms to ensure proper translation. In general, the 
first and preferred embodiment of the present invention is 
more robust and flexible than the second embodiment 
because direct conversion of text into speech, without inter- 
mediate text-to-text translation, is not constrained by the 
limitations of a word dictionary and is less susceptible to 
problems arising from word spelling variations. 
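
The second method of FIG. 5 (identify the language, translate into the subscriber's preferred language, then convert to speech) might be sketched as follows. The function and type names are hypothetical, the identifier, translator, and engine are stubs, and skipping translation when the identified language already matches the preference is an assumption not stated in the text.

```python
from typing import Callable, Dict, Tuple

Translator = Callable[[str, str], str]  # (text, source language) -> target text
Engine = Callable[[str], str]           # text -> stand-in for synthesized speech


def review_with_translation(
    message: str,
    preferred: str,
    identify: Callable[[str], Tuple[str, float]],
    translators: Dict[str, Translator],  # keyed by target language
    engines: Dict[str, Engine],          # keyed by language
    threshold: float = 0.20,             # exemplary 20% threshold from the text
) -> str:
    language, likelihood = identify(message)       # language identification
    if likelihood < threshold:                      # identification failed
        return "language identification unsuccessful"
    if language != preferred:                       # translation (step 514)
        message = translators[preferred](message, language)
    return engines[preferred](message)              # conversion (step 516)


# Stub components, for illustration only.
identify = lambda text: ("fr", 0.9)
translators = {"en": lambda text, source: f"<{source}->en> {text}"}
engines = {"en": lambda text: f"speech(en): {text}"}
result = review_with_translation("bonjour", "en", identify, translators, engines)
```

Because every message passes through a translator and its word dictionary, any vocabulary gap in the dictionary surfaces in the spoken output, which is the limitation the paragraph above attributes to this embodiment.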

From above it can be seen that the present invention is 
related to a unified messaging system and includes a voice 
gateway server coupled to an electronic mail system and a 
private branch exchange (PBX). The voice gateway server 
provides voice messaging services to a set of subscribers. 
Within the voice gateway server, a trigraph analyzer
sequentially examines 3-character combinations within a
text message; determines occurrence frequencies for the
character combinations; compares
the occurrence frequencies with reference occurrence sta-
tistics modeled from text samples written in particular
languages; and generates a language identifier and a like-
lihood value for the text message. Based upon the language
identifier, a message inquiry unit selects an appropriate 
text-to-speech engine for converting the text message into 
computer-generated speech that is played to a subscriber. 
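
The overall playback flow summarized above, including the exemplary 20% likelihood threshold and the prerecorded failure message described earlier, can be sketched as a simple loop. The names here are hypothetical, the identifier and engines are stubs, and treating a language with no available engine as a failure is an assumption added for this example.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Identify = Callable[[str], Tuple[str, float]]  # text -> (language id, likelihood)
Engine = Callable[[str], str]                  # text -> stand-in for speech

FAILURE_PROMPT = "language identification unsuccessful"


def review_text_messages(messages: Iterable[str], identify: Identify,
                         engines: Dict[str, Engine],
                         threshold: float = 0.20) -> List[str]:
    """Return what would be played for each message, in order."""
    played = []
    for text in messages:                       # one pass per message
        language, likelihood = identify(text)
        if likelihood < threshold or language not in engines:
            played.append(FAILURE_PROMPT)       # prerecorded message
            continue
        played.append(engines[language](text))  # selected engine converts
    return played


def toy_identify(text: str) -> Tuple[str, float]:
    # Stub identifier: confident only when it sees a French greeting.
    return ("fr", 0.9) if "bonjour" in text else ("en", 0.1)


engines = {"fr": lambda t: f"fr-speech({t})", "en": lambda t: f"en-speech({t})"}
played = review_text_messages(["bonjour Paul", "zzz"], toy_identify, engines)
```

Setting `threshold` to zero approximates the "best guess" alternate embodiment in which the likelihood value is disregarded.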

While the present invention has been described with 
reference to certain preferred embodiments, those skilled in 
the art will recognize that various modifications can be 
provided. For example, a language identification tool based 
upon techniques other than n-graph methods could be uti- 
lized instead of the trigraph analyzer 260 and associated 
corecurrence libraries 272, 273, 274, 275, 276. As another 
example, one or more text-to-speech engines 242, 243, 244, 
245, 246 could be implemented via hardware, such as 
through "off-board" text-to-speech engines accessed 
through the use of remote procedure calls. As yet another
example, converted speech data or translated text data could 
be stored for future use, which could be useful in a store-
once, multiple-playback environment. The description
herein provides for these and other variations upon the 
present invention, which is limited only by the following 
claims. 
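
The store-once, multiple-playback variation mentioned above could be sketched as a small cache keyed by language and message text, so that a message is converted once and replayed from storage thereafter. The `SpeechCache` class and the `fake_synthesize` function are hypothetical placeholders introduced for illustration and are not elements of the described system.

```python
import hashlib


class SpeechCache:
    """Stores converted speech so each message is synthesized only once."""

    def __init__(self, synthesize):
        # synthesize is a hypothetical callable: (text, language) -> bytes
        self._synthesize = synthesize
        self._store = {}
        self.conversions = 0  # number of actual conversions performed

    def get_speech(self, text, language):
        key = hashlib.sha256(f"{language}:{text}".encode("utf-8")).hexdigest()
        if key not in self._store:
            self._store[key] = self._synthesize(text, language)
            self.conversions += 1
        return self._store[key]


def fake_synthesize(text, language):
    # Stand-in for a real text-to-speech engine; returns placeholder bytes.
    return f"<{language} audio for {len(text)} chars>".encode("utf-8")


cache = SpeechCache(fake_synthesize)
first = cache.get_speech("Meeting moved to 3 pm.", "en")
second = cache.get_speech("Meeting moved to 3 pm.", "en")  # served from storage
```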

We claim: 

1. A method of operating language-based conversion of a 
present text message into speech, the method comprising the 
following steps: 

a. retrieving the present text message; 

b. automatically generating a language identifier corre- 
sponding to the present text message, wherein the step 
of generating includes: 

(1) examining a sequence of characters of the present 
text message; 

(2) determining an actual frequency of occurrence of 
character combinations within the sequence of char- 
acters of the present text message; and 

(3) matching the actual frequency of occurrence of the 
character combinations within the sequence of char- 
acters of the present text message with one of a 
plurality of corecurrence libraries;

c. converting the present text message directly into 
computer-generated speech in a language correspond- 
ing to the language identifier using a language-specific 
text-to-speech engine, wherein the language-specific 
text-to-speech engine is selected according to the lan- 
guage identifier; and 

d. playing the computer generated speech to a subscriber. 

2. The method as claimed in claim 1, wherein the step of
converting is performed by the language-specific text-to- 
speech engine utilizing a phoneme library. 
3. The method as claimed in claim 1, further comprising
the following steps: 

a. sensing a subsequent text message; and 

b. repeating the steps of retrieving, generating, 
converting, and playing in response to the step of 
sensing. 

4. The method as claimed in claim 1, wherein the step of
matching further comprises the steps of:

a. comparing the actual frequency of occurrence with each
of a plurality of reference frequencies wherein each of 
the plurality of reference frequencies corresponds to 
one of the plurality of corecurrence libraries; and 

b. determining a best match between the actual frequency 
of occurrence and one of the plurality of reference 
frequencies. 

5. The method as claimed in claim 1, wherein the step of 
examining comprises using a trigraph analyzer for inspect- 
ing the character combinations and wherein the character 
combinations comprise three consecutive characters within 
the sequence of characters. 

6. The method as claimed in claim 1, wherein the 
sequence of characters is found in a first portion of the 
present text message. 

7. The method as claimed in claim 1, wherein the step of
matching further comprises the following steps:

a. comparing the actual frequency of occurrence with each 
of a plurality of reference frequencies wherein each of 
the plurality of reference frequencies corresponds to 
one of the plurality of corecurrence libraries; and 

b. determining that a sufficient number of matches exist 
between the actual frequency of occurrence and one of 
the plurality of reference frequencies. 

8. The method as claimed in claim 7, wherein the step of 
matching is performed when there is the sufficient number of 
matches between the actual frequency of occurrence and one 
of the plurality of reference frequencies.

9. The method as claimed in claim 7, further comprising 
the step of terminating the method when the sufficient 
number of matches does not exist. 

10. A method of providing language-based conversion of 
an original text message into speech for a user comprising 
the following steps: 

a. retrieving the original text message; 

b. automatically generating a language identifier corre- 
sponding to the original text message, wherein the step 
of generating includes: 

(1) examining a sequence of characters of the original 
text message; 

(2) determining an actual frequency of occurrence of 
character combinations within the sequence of char- 
acters; and 

(3) matching the frequency of occurrence of the char- 
acter combinations with one of a plurality of core- 
currence libraries; 

c. automatically selecting an appropriate one text trans- 
lator from a plurality of text translators, wherein each 
of the plurality of translators corresponds to one of a 
plurality of languages and the appropriate one text
translator is selected based upon the language identi- 
fier; 

d. translating the original text message into a translated 
text message in a user selected language in response to 
the appropriate text translator; 

e. converting the translated text message into computer
generated speech based upon the user selected lan- 
guage; and 

f. playing the computer generated speech to the user. 
11. The method as claimed in claim 10, further comprising
the following step: polling the user for the user selected 
language. 

12. The method as claimed in claim 10, wherein the step
of matching further comprises the following steps:

a. comparing the actual frequency of occurrence with each 
of a plurality of reference frequencies wherein each of 
the plurality of reference frequencies corresponds to 
one of the plurality of corecurrence libraries; and 

b. determining that there is a sufficient number of matches
between the actual frequency of occurrence and one of 
the plurality of reference frequencies. 

13. The method as claimed in claim 10, wherein the step
of matching further comprises the following steps:

a. comparing the actual frequency of occurrence with each 
of a plurality of reference frequencies wherein each of 
the plurality of reference frequencies corresponds to 
one of the plurality of corecurrence libraries; and 

b. determining a best match between the actual frequency
of occurrence and the plurality of reference frequen- 
cies. 

14. A messaging system for converting a text message into 
computer generated speech, the system comprising: 

a. means for storing the text message;

b. means for automatically generating a language identi- 
fier corresponding to the text message wherein the 
means for automatically generating is coupled to the 
means for storing, wherein the means for generating 
includes means for determining an actual frequency of 
occurrence of character combinations within a 
sequence of characters of the text message and means 
for comparing the actual frequency of occurrence of the 
character combinations within the sequence of charac- 
ters with a plurality of reference frequencies wherein 
each reference frequency corresponds to a particular 
corecurrence library; 

c. a plurality of text-to-speech engines coupled to the 
means for storing wherein each of the plurality of 
text-to-speech engines corresponds to one of a plurality 
of languages and an appropriate one text-to-speech
engine based on the language identifier converts the 
text message into the computer generated speech; and 

d. means for playing the computer-generated speech to a
subscriber. 

15. The system as claimed in claim 14, further comprises 
a phoneme library coupled to the text to speech engine for 
converting the text message into the computer generated 
speech.

16. The system as claimed in claim 14, wherein the means 
for automatically generating further comprises a trigraph 
analyzer to formulate an occurrence frequency of the text 
message based on examining a combination of three con-
secutive characters within the text message.

17. The messaging system as claimed in claim 14, 
wherein the means for comparing the actual frequency of 
occurrence of the character combinations within the 
sequence of characters of the text message further comprises 
means for determining a closest match between the actual 
frequency of occurrence of the character combinations and 
one of the plurality of reference frequencies. 

18. The messaging system as claimed in claim 17 
wherein: 

a. the means for determining the actual frequency of
occurrence of the character combinations further com-
prises:

1. means for dividing the sequence of characters of the
present text messages into a plurality of sequential 
character sets, and 

2. means for determining the actual frequency of occur- 
rence further comprises determining a set of actual 
rates at which each of the plurality of sequential 
character sets occur in the sequence of characters of 
the text message; and 

b. the means for comparing further comprises means for 
matching the set of actual rates with one of the plurality 
of corecurrence libraries. 

19. The messaging system as claimed in claim 18, 
wherein the means for matching the set of actual rates 
includes: 

a. means for comparing the set of actual rates at which 
each of the plurality of sequential character sets occur 
in the sequence of characters of the text message with 
a plurality of reference occurrence frequencies for the 
sequence of characters wherein each reference occur- 
rence frequency corresponds to one of the plurality of 
corecurrence libraries; and 

b. means for determining a closest match between the set 
of actual rates and one of the reference occurrence 
frequencies. 

20. A voice messaging system for providing voice mes- 
saging services to a set of subscribers, the voice messaging 
system comprising: 

a. means for retrieving a text message; 

b. means for automatically generating a language identi- 
fier corresponding to the text message, wherein the 
means for generating includes means for comparing an 
actual frequency of occurrence of character combina-
tions within a sequence of characters of the text mes-
sage with a plurality of reference frequencies wherein 
each reference frequency corresponds to a particular 
corecurrence library; 

c. means for converting the text message directly into 
computer-generated speech, wherein the means for 
converting uses a language-specific text-to-speech 
engine that is selected based upon the language iden- 
tifier; and 

d. means for playing the computer-generated speech to a 
subscriber. 

21. The voice messaging system according to claim 20, 
further comprising a voice gateway server configured to be 
connected to a computer network and a Private Branch 
Exchange, wherein the voice gateway server facilitates the 
exchange of messages between the computer network and 
the Private Branch Exchange. 

22. A method of operating language-based conversion of 
a present text message into speech, the method comprising 
the following steps: 

a. retrieving the present text message; 

b. automatically generating a language identifier corre- 
sponding to the present text message, wherein the step 
of generating includes: 

(1) examining a sequence of characters of the present 
text message; 

(2) determining an actual frequency of occurrence of 
the sequence of characters within the present text 
message; and 

(3) matching the actual frequency of occurrence of the 
sequence of characters with one of a plurality of 
corecurrence libraries, wherein each of the plurality
of corecurrence libraries corresponds to a different 
language; 
c. selecting an appropriate one text-to-speech engine from
a plurality of text-to-speech engines wherein each of 
the plurality of text-to-speech engines corresponds to 
one of a plurality of languages and the appropriate one 
text-to-speech engine is selected based upon the lan-
guage identifier;

d. converting the present text message directly into 
computer-generated speech in response to the appro- 
priate text-to-speech engine; and 

e. playing the computer generated speech to a subscriber. 

23. The method as claimed in claim 22, wherein the step 
of matching the actual frequency of occurrence includes:

a. comparing the actual frequency of occurrence of the 
sequence of characters within the present text message 
with a plurality of reference occurrence frequencies for
the sequence of characters, wherein each reference 
occurrence frequency corresponds to one of the plural- 
ity of corecurrence libraries, and 

b. determining a closest match between the actual fre-
quency of occurrence of the sequence of characters and
one of the reference occurrence frequencies. 

24. The method as claimed in claim 22, wherein: 

a. the step of examining the sequence of characters further 
comprises dividing the sequence of characters of the
present text messages into a plurality of sequential
character combinations; 

b. the step of determining the actual frequency of occur- 
rence further comprises determining a set of actual 
rates at which each of the plurality of sequential 
character combinations occur in the sequence of char- 
acters of the present text message; and 

c. the step of matching the actual frequency of occurrence
further comprises matching the set of actual rates with
one of the plurality of corecurrence libraries. 

25. The method as claimed in claim 24, wherein the step 
of matching the set of actual rates includes: 

a. comparing the set of actual rates at which each of the 
plurality of sequential character combinations occur in
the sequence of characters of the present text message 
with a plurality of reference occurrence frequencies for 
the sequence of characters, wherein each reference 
occurrence frequency corresponds to one of the plural- 
ity of corecurrence libraries, and 

b. determining a closest match between the set of actual 
rates and one of the reference occurrence frequencies. 
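
For illustration only, the "sufficient number of matches" test recited in claims 7 to 9 and claim 12 might be sketched as follows: the most frequent 3-character combinations of a message are compared with each reference library, and processing stops when no library reaches a minimum match count. The library contents, the `min_matches` threshold, and the function names are assumptions made for this sketch and do not define or limit the claims.

```python
from collections import Counter

# Hypothetical reference libraries: a few common trigraphs per language.
REFERENCE_LIBRARIES = {
    "en": {"the", "he ", " th", "ing", "and", " an", "nd ", "ion"},
    "fr": {"les", "es ", " de", "de ", "ent", " le", "le ", "que"},
}


def top_trigraphs(text, limit=20):
    """Most frequent 3-character combinations in the text."""
    chars = " ".join(text.lower().split())  # normalize case and whitespace
    counts = Counter(chars[i:i + 3] for i in range(len(chars) - 2))
    return [tri for tri, _ in counts.most_common(limit)]


def match_language(text, min_matches=3):
    """Return a language identifier, or None when too few matches exist."""
    observed = set(top_trigraphs(text))
    match_counts = {
        lang: len(observed & library)
        for lang, library in REFERENCE_LIBRARIES.items()
    }
    best = max(match_counts, key=match_counts.get)
    if match_counts[best] < min_matches:
        return None  # insufficient matches: terminate rather than guess
    return best


result_en = match_language("the meeting and the training are in the morning")
result_none = match_language("xyzzy qwrt")
```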